Balancing Act: Asynchronous Compute and Preemption
Asynchronous compute refers to a GPU's ability to run graphics workloads (traditional rendering, for example) and compute workloads (physics calculations, post-processing and so on) at the same time. It is becoming increasingly important as game engines grow in complexity and as VR takes off. It is also an area in which Nvidia was seen to be lacking, whereas AMD's GPUs have had decent hardware-level support for it thanks to their multiple Asynchronous Compute Engines.
With Maxwell, the GPU would partition itself so that one part handled graphics and the other part compute. However, for a given workload, these partitions were static and if one portion completed before the other (the most likely scenario), that part of the GPU would then remain idle until the rest of the work was finished, wasting resources and reducing performance. Pascal addresses this by introducing hardware-level Dynamic Load Balancing, allowing any portion that becomes idle to take on the remaining workload, putting the whole GPU to work as opposed to one fixed, limited partition.
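The cost of a static split can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not from Nvidia: the GPU is treated as one unit of throughput that divides perfectly between the two workloads, and the function names and workload sizes are hypothetical.

```python
def static_partition_time(graphics_work, compute_work, graphics_share):
    """Finish time when the GPU is split into two fixed partitions.

    graphics_share is the fraction of the GPU given to graphics; the
    rest goes to compute. Whichever partition finishes first sits idle
    until the other one is done.
    """
    compute_share = 1.0 - graphics_share
    graphics_time = graphics_work / graphics_share
    compute_time = compute_work / compute_share
    return max(graphics_time, compute_time)


def dynamic_balance_time(graphics_work, compute_work):
    """Finish time when idle units can pick up whatever work remains.

    Assumes perfect balancing, so the whole GPU stays busy throughout.
    """
    return graphics_work + compute_work


# Hypothetical workload: 6 units of graphics, 2 units of compute,
# with the GPU split evenly between them in the static case.
static_time = static_partition_time(6.0, 2.0, graphics_share=0.5)
dynamic_time = dynamic_balance_time(6.0, 2.0)
print(static_time, dynamic_time)
```

In this model the static split only matches the dynamic case when the partition sizes happen to mirror the workload exactly (a 0.75 graphics share here). Any other split leaves part of the GPU idle.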
Another important element of asynchronous compute is preemption, i.e. the ability to interrupt an existing workload so that a time-critical one can be executed as soon as possible. Ideally, this involves saving the existing workload's progress to cache as quickly as possible to make way for the incoming one. This usually takes time, however, as the existing workload typically needs to finish a certain amount of in-progress work before anything can be saved.
Pascal, however, introduces pixel-level preemption. The various units have a precise means of tracking their own progress, so that once a preemption request is issued, the position of each draw call and triangle is saved. Any pixels that have already been rasterised will finish shading, after which the GPU can switch to a new workload in less than 100 microseconds. If it's a compute workload, preemption occurs at the thread level (also very fine-grained), or at the instruction level (the finest granularity possible) if it's a CUDA compute task.
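Why finer granularity shortens the wait can be sketched with a simple model in which a workload can only be interrupted at fixed boundaries. The intervals below are hypothetical numbers chosen for illustration, not measurements of any GPU, and the function name is an assumption of this sketch.

```python
def wait_until_boundary(request_time_us, boundary_interval_us):
    """Microseconds from a preemption request to the next boundary.

    Models a workload that can only be interrupted every
    boundary_interval_us microseconds. A request that lands exactly on
    a boundary waits zero microseconds.
    """
    elapsed_in_interval = request_time_us % boundary_interval_us
    return (boundary_interval_us - elapsed_in_interval) % boundary_interval_us


# Hypothetical intervals: a boundary only at the end of each draw call,
# versus a boundary after each small batch of in-flight pixels.
DRAW_CALL_INTERVAL_US = 2000
PIXEL_BATCH_INTERVAL_US = 50

request_at_us = 1230
coarse_wait = wait_until_boundary(request_at_us, DRAW_CALL_INTERVAL_US)
fine_wait = wait_until_boundary(request_at_us, PIXEL_BATCH_INTERVAL_US)
print(coarse_wait, fine_wait)
```

The worst-case wait in this model is just under one full interval, so shrinking the interval shrinks the bound on how long a time-critical task can be held up.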
The main application of this feature relevant to gamers is in VR, specifically asynchronous time warp, which is where a rendered image is quickly distorted a little right before the display refresh to bring it in line with the most up-to-date position of the headset and reduce perceived latency. Pixel-level preemption allows this command to be issued much closer to when it needs to be executed, cutting down on potential idle time.
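The timing argument can be made concrete with a rough sketch. It assumes the time warp must be issued early enough to cover both its own run time and the worst-case preemption delay; the refresh interval, warp duration and delay figures are hypothetical, as are the function names.

```python
def latest_issue_time_us(vsync_us, warp_duration_us, worst_preempt_us):
    """Latest moment the warp can be issued and still finish by vsync.

    The scheduler has to budget for the worst-case preemption delay
    before the warp starts, plus the warp's own run time.
    """
    return vsync_us - warp_duration_us - worst_preempt_us


def pose_age_at_vsync_us(vsync_us, issue_time_us):
    """How stale the sampled headset pose is when the frame is shown."""
    return vsync_us - issue_time_us


# Hypothetical figures: an 11,111 us refresh interval (about 90Hz),
# a 500 us warp, and two assumed worst-case preemption delays.
VSYNC_US = 11111
WARP_US = 500

coarse_issue = latest_issue_time_us(VSYNC_US, WARP_US, worst_preempt_us=2000)
fine_issue = latest_issue_time_us(VSYNC_US, WARP_US, worst_preempt_us=100)

print(pose_age_at_vsync_us(VSYNC_US, coarse_issue))
print(pose_age_at_vsync_us(VSYNC_US, fine_issue))
```

Under these assumptions, a smaller worst-case preemption delay lets the warp sample the headset position later, so the displayed image reflects a fresher pose.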
You may have noticed that many of the improvements in Pascal seem geared towards VR, and this is true. While Nvidia says the GTX 1080 is roughly 1.7x faster in traditional games than the GTX 980, the additional architectural advances described over the last couple of pages mean that in VR applications it is closer to 2.7x faster.